You were introduced to these mysterious
properties called PartitionKey and RowKey, but you didn’t really learn much about
them. To understand partitioning, it is useful to have a mental model of
how the Azure Table service works. Azure tables give developers scalable
storage, which means developers should be able to dump terabytes of data
if necessary. All of this data naturally must be hosted across multiple
machines. The question then becomes, “How do you partition data across
these nodes?”
Partitioning has a few key implications. Picking the right
partitioning scheme is critical; otherwise, you could wind up with too much data on one
node (bad), or related data on different nodes (really bad). Entities
with the same partition key share the same partition, and are
guaranteed to be stored together. The partition is the unit of
distribution, and is primarily meant for scalability.
This does not mean, however, that each partition is located on a
separate node. The system automatically balances your partitions based
on size, traffic, and other factors. For example, several partitions
might start off together on the same node, but get moved away when one
partition grows in size. In any case, you should never depend on
separate partitions being together. On the other
hand, you can always depend on entities within the same partition being
together.
When users hear that entities with the same partition key are
stored together, they wonder whether they’ll run out of space when the
actual physical machine holding the partition runs out of space. The
answer is “no”—you cannot run out of space in a single partition.
Without revealing some of the “secret sauce” behind the storage
system, note that though terms such as node and
partition are used here, they don’t necessarily
mean “the same machine.” Some magic takes place under the covers to
ensure that data in the same partition can be queried and retrieved
together really, really quickly. However, it might be useful to have a mental model of
“one partition = one machine with a disk of near-infinite
capacity.” It makes visualization and whiteboard drawing
much easier.
Partitioning (or, to be more precise, specifying the right
partition key in the query) is the biggest factor affecting query
performance. The general principle behind fast queries in any storage
system is to structure your data and query in such a way that the
storage system must do a minimal amount of traversal.
In the database world, this typically means using indexes. Without
indexes to help the query processor find the right rows, every query
would result in a slow table scan across all rows. The same principle
holds true for Azure tables. You must partition your data and queries to
make the storage system do the least amount of traversal possible. In
general, you must make your queries as specific as possible.
Consider a simple table such as the one shown in Table 1.
Table 1. Superhero table
PartitionKey (Comic universe) | RowKey (Character name) | Property 3 (Superpower) | Property N (First appeared in)
---|---|---|---
Marvel | Cyclops | Heat Ray | The X-Men (#1)
Marvel | Wolverine | Healing + Adamantium Skeleton | The Incredible Hulk (#180)
DC | Superman | Flight, super-strength, and so on | Action Comics (#1)
DC | Batman | None | Detective Comics (#27)
DC | Lex Luthor | None | Action Comics (#24)
DC | Flash | Super speed | Flash Comics (#1)
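In code, an entity for this table might be modeled as follows. This is only a sketch to make the table concrete; the class name SuperheroEntity and its property names are illustrative assumptions, not types used later in this chapter.

// Hypothetical entity for Table 1: the comic universe becomes the
// PartitionKey, and the character name becomes the RowKey.
public class SuperheroEntity : TableServiceEntity
{
    public SuperheroEntity(string universe, string name)
        : base(universe, name) { }

    // Parameterless constructor needed by ADO.NET Data Services
    public SuperheroEntity() { }

    public string Superpower { get; set; }
    public string FirstAppearedIn { get; set; }
}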
Now, with the entries in Table 1 in
mind, let’s walk through a few sample queries (specified in pseudocode)
to see how partitioning can affect performance. Let’s assume that each
partition is hosted on a separate storage node.
First, let’s find entities with the following pseudocode:
partition = "DC" and RowKey="Flash"
This is the fastest kind of query. In this case, both the
partition key and the row key are specified. The system knows which
partition to go to, and queries that single partition for the specified
row.
Note:
Always try to specify the partition key in your queries. This
helps query performance because the storage system knows exactly which
node to query. When the partition key isn’t specified, the storage
system must query all the partitions in the system, which is obviously
slower. Whether you can specify partition
keys in all your queries depends on how you partition your data.
Next, let’s find entities with the following pseudocode:
PartitionKey="DC" and SuperPower=None
In this query, the partition key is specified, but the filter is on an
attribute other than the row key. This is fast (since the partition key is
specified), but not as fast as when the row key is specified.
Finally, let’s find entities with the following pseudocode:
SuperPower=None
This is the slowest kind of query. In this case, the storage
system must query each of the table’s partitions, and then walk through
each entity in the partition. You should avoid queries such as this that
don’t specify any of the keys.
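Expressed as LINQ against the hypothetical SuperheroEntity class sketched earlier (with svc as a table service context, and “Superheroes” as an assumed table name), the three queries look like this:

// 1. Point query: both keys specified (fastest)
var flash = (from e in svc.CreateQuery<SuperheroEntity>("Superheroes")
             where e.PartitionKey == "DC" && e.RowKey == "Flash"
             select e).FirstOrDefault();

// 2. Partition scan: partition key specified, filter on a non-key property
var dcWithoutPowers = from e in svc.CreateQuery<SuperheroEntity>("Superheroes")
                      where e.PartitionKey == "DC" && e.Superpower == "None"
                      select e;

// 3. Full scan: no keys specified (slowest; avoid)
var anyoneWithoutPowers = from e in svc.CreateQuery<SuperheroEntity>("Superheroes")
                          where e.Superpower == "None"
                          select e;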
In a traditional RDBMS, you would specify an index to speed up
such queries. However, Azure’s Table service doesn’t support these
“secondary indexes.” (The row key is considered to be the primary
index.) You can emulate the behavior of these secondary indexes
yourself, though, by creating another table that maps these properties
to the rows that contain them. You’ll see an example of how to do this
later.
Note:
Secondary indexes are part of the road map for Azure tables, and
you should see them in a future release. At that time, you won’t need
these workarounds.
This approach has a few downsides. First, you’ll wind up storing
more data in the system because of the extra tables. Second, you’ll be
doing more I/O in the write/update code, which could affect
performance.
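To make that trade-off concrete, here is the workaround in miniature. All the names here are hypothetical; the pattern is simply to write each logical row to two tables, keyed differently:

// Hypothetical "index" entity: PartitionKey = superpower, RowKey = character name
public class SuperpowerIndexEntity : TableServiceEntity
{
    public SuperpowerIndexEntity(string superpower, string name)
        : base(superpower, name) { }
    public SuperpowerIndexEntity() { }
}

// Every insert now writes two rows (the extra storage and I/O mentioned above)
svc.AddObject("Superheroes",
    new SuperheroEntity("DC", "Batman") { Superpower = "None" });
svc.AddObject("SuperpowerIndex", new SuperpowerIndexEntity("None", "Batman"));
svc.SaveChanges();

// "Who has no superpowers?" is now a single-partition query on the index table
var noPowers = from e in svc.CreateQuery<SuperpowerIndexEntity>("SuperpowerIndex")
               where e.PartitionKey == "None"
               select e;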
You should keep a couple of considerations in mind that influence
partitioning:
Ensuring locality of reference
In the previous query example, you saw how it is much faster
to query only a single partition. Imagine a scenario in which your
query must deal with different types of data. Ensuring that the
data has the same partition key means the query can return results
from just one partition.
Avoiding hot partitions
The storage system redistributes and load-balances traffic.
However, queries and updates to a given partition are served from that
same partition. It might be wise to ensure that hot data is split
across partitions to avoid putting a lot of stress on one node. In
general, though, this isn’t necessary:
Azure’s Table service can serve data from a partition quickly,
and can take quite a bit of load on a single partition. This is a
concern only for applications with very high read-access rates.
Running stress tests is a good way to identify whether
your application needs this.
Note:
You can create as many partitions as you like. In fact,
the more partitions you have, the better Azure’s Table service
can spread out your data in case of heavy load. Like all things
in life, this is a trade-off. Aggregate queries that span
multiple partitions will see a drop in performance.
1. Picking the right partition key
Like a ritual, designing a database schema follows some set patterns.
In short, you “model” the data you want to store, and then go about
normalizing this schema. In the Windows Azure world, you start the
same way, but you give a lot of importance to the queries that your
application will be executing. In fact, it might be a good idea to
begin with a list of queries that you know need good performance, and
use that as the starting point to build out the table schema and
partitioning scheme.
Follow these steps:
1. Start with the key queries that your system will execute.
Prioritize them in order of importance and the performance required.
For example, a query to show the contents of your shopping cart
must be much faster than a query to show a rarely generated
report.
2. Using those key queries, create your table schema. Ensure
that the partition key can be specified in
performance-sensitive queries. Estimate how much data you expect
in each table and each partition. If one partition winds up with
too much data (for example, if it is an order of magnitude greater
than any other partition), make your partitioning more granular by
concatenating other properties into the partition key.
For example, if you’re building a web log analyzer and
storing the URL as the partition key hurts you with very popular
URLs, you can put date ranges in the partition key. This splits
the data so that, for example, each partition contains data for a
URL for only a specific day.
3. Pick a unique identifier for the RowKey. Row keys must be unique within the
partition. For example, in Table 1, you
used the superhero’s name as the RowKey, since it was unique within the
partition.
Of course, hindsight is 20/20. If you find that your
partitioning scheme isn’t working well, you might need to change it
on-the-fly. In the previous web log analyzer example, you could do
that by making the size of the date range dynamic. If the data size on
a particular day was huge (say, over the weekend), you could switch to
using an hourly range only for weekends. Your application must be
aware of this dynamic partitioning, and this should be built in from
the start.
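In code, such a partition-key chooser might look like the following sketch (the method and its rules are illustrative assumptions, not a prescribed API):

// Hypothetical partition-key chooser for the web log analyzer:
// weekends get hourly partitions, other days get daily partitions.
public static string GetLogPartitionKey(string url, DateTime timestamp)
{
    bool isWeekend = timestamp.DayOfWeek == DayOfWeek.Saturday ||
                     timestamp.DayOfWeek == DayOfWeek.Sunday;
    return isWeekend
        ? url + "_" + timestamp.ToString("yyyyMMdd_HH")  // hourly range
        : url + "_" + timestamp.ToString("yyyyMMdd");    // daily range
}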
In general, partition keys are considered the unit of distribution/scalability,
while row keys are meant for uniqueness. If the key for your data
model has only one property, you should use that as your partition
key (an empty row key would suffice) and have one row per partition.
If your key has more than one property, distribute the properties
between the partition key and the row key to get multiple rows per
partition.
2. Testing the theory
You’ve just seen the impact of specifying versus not specifying
a partition key, or a query executing on one partition versus a query
executing on multiple partitions. Now, let’s build some home-grown
benchmarks to prove these points.
Warning:
These benchmarks were run from a network with multiple layers
of proxies (and several hundred miles) between the
machine and the cloud; when you run in the cloud, you’ll be
running in the same data center. Also, no optimizations were
performed as they would have been in a production application. You
should look at the relative difference between the following
numbers, rather than the actual numbers themselves. Running the same
unoptimized code in the cloud gives vastly different numbers: around
350 ms for retrieving 1,000 rows.
Example 1 shows a simple entity with a
partition key and a row key (which doubles up as the Data member). You also write a vanilla
TableServiceContext wrapper around the
entity. The entity isn’t interesting by itself. The interesting part
is how you partition the data.
Example 1. Test entity
public class TestEntity : TableServiceEntity
{
    public TestEntity(string id, string data)
        : base(id, data)  // id becomes the PartitionKey, data the RowKey
    {
        ID = id;
        Data = data;
    }

    // Parameterless constructor always needed for ADO.NET Data Services
    public TestEntity() { }

    public string ID { get; set; }
    public string Data { get; set; }
}

public class TestDataServiceContext : TableServiceContext
{
    public TestDataServiceContext(string baseAddress, StorageCredentials credentials)
        : base(baseAddress, credentials) { }

    internal const string TestTableName = "TestTable";

    public IQueryable<TestEntity> TestTable
    {
        get { return this.CreateQuery<TestEntity>(TestTableName); }
    }
}
Though it is not shown in Example 1, you
also create an exact copy of these two classes with the number
2 appended to the type names
(TestEntity2 and TestDataServiceContext2). You will try out
two different partitioning schemes on TestEntity and TestEntity2.
For TestEntity, let’s insert
100,000 rows, as shown in Example 2. Let’s create
them all in the same partition (with partition key 1). The storage system will place all the
entities on the same storage node.
Example 2. Inserting 100,000 rows into the same partition
var account = CloudStorageAccount.Parse(
    ConfigurationSettings.AppSettings["DataConnectionString"]);
var svc = new TestDataServiceContext(account.TableEndpoint.ToString(),
    account.Credentials);

for (int i = 1; i <= 100000; i++)
{
    // Every entity shares the same partition key, "1"
    svc.AddObject("TestTable", new TestEntity("1", "RowKey_" + i.ToString()));
    svc.SaveChanges();  // send the insert to the table service
}
For TestEntity2, let’s insert
100,000 rows, but let’s split them among 1,000 different partitions.
You loop from 1 to 100,000 and take the loop counter modulo 1,000 to get
evenly spaced partitions. Example 3 shows how to do
this.
Example 3. Inserting 100,000 rows in 1,000 partitions
var account = CloudStorageAccount.Parse(
    ConfigurationSettings.AppSettings["DataConnectionString"]);
var svc2 = new TestDataServiceContext2(account.TableEndpoint.ToString(),
    account.Credentials);

for (int i = 1; i <= 100000; i++)
{
    // i % 1000 spreads the rows evenly across 1,000 partition keys
    svc2.AddObject("TestTable2", new TestEntity2((i % 1000).ToString(),
        "RowKey_" + i.ToString()));
    svc2.SaveChanges();  // send the insert to the table service
}
Now, let’s run three different queries. The first query will be
against the 100,000 rows of TestEntity that are in the same partition.
The second will be against the 100,000 rows of TestEntity2, but with no partition key
specified. The third will be the same as the second, but with the
partition key specified in the query. Example 4 shows the code for the
three.
Example 4. Three different queries
// Single-partition query
var query = from entity in svc.CreateQuery<TestEntity>("TestTable")
            where entity.PartitionKey == "1" && entity.RowKey == "RowKey_55000"
            select entity;

// Multiple-partition query - no partition key specified
var query2 = from entity in svc2.CreateQuery<TestEntity2>("TestTable2")
             where entity.RowKey == "RowKey_55553"
             select entity;

// Multiple-partition query - partition key specified in query
var query3 = from entity in svc2.CreateQuery<TestEntity2>("TestTable2")
             where entity.PartitionKey == "553" && entity.RowKey == "RowKey_55553"
             select entity;
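The timing harness itself isn’t shown in the listing; a minimal sketch (an assumption, not the original benchmark code) might look like this:

// Run one of the queries above 1,000 times and measure the total time
var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    var entity = query.FirstOrDefault();  // executes the query each iteration
}
watch.Stop();
Console.WriteLine("1,000 iterations took {0:F1} seconds",
    watch.Elapsed.TotalSeconds);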
In each of these queries, let’s retrieve one entity using the
FirstOrDefault method. Table 2 shows the relative numbers
for 1,000 iterations of each of these queries.
Table 2. Query performance comparison
Query type | Time for 1,000 iterations (in seconds)
---|---
Single partition | 26
Multiple partitions, no partition key specified | 453
Multiple partitions, partition key specified | 25
The results speak for themselves. Going to a single partition
(either because all your data is stored in it or because you specified
it in the query) is always much faster than not specifying the
partition key. Of course, using only a single partition has several
downsides, as discussed earlier. In general, query times are affected
less by how the data is partitioned than by whether the partition key
is specified in the query.
Warning:
If you want to run similar tests, insert the following
configuration settings into your configuration file (either App.config or web.config):
<system.net>
<settings>
<servicePointManager expect100Continue="false"
useNagleAlgorithm="false" />
</settings>
</system.net>
The first setting deals with a bug in .NET where
every request is sent with an Expect: 100-Continue header. If you’re sure that
your client handles errors from the server well, you can turn this
off.
The second setting addresses an issue that arises if you do several
synchronous updates close together, as this benchmark program does.
Because delayed ACKs are turned on at the server, the client winds up
waiting much longer than it should when the Nagle algorithm is
turned on.
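Equivalently, you can switch off both behaviors in code; these are standard System.Net properties:

// Same effect as the configuration file entries shown earlier.
// Set these once at startup, before making any storage requests.
System.Net.ServicePointManager.Expect100Continue = false;
System.Net.ServicePointManager.UseNagleAlgorithm = false;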